Recursive Array Layouts and Fast Matrix Multiplication

Authors

  • Siddhartha Chatterjee
  • Alvin R. Lebeck
  • Praveen K. Patnala
  • Mithuna Thottethodi
Abstract

The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2 to 2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between 10% and 20%. Carrying the recursive layout down to the level of individual matrix elements is shown to be counter-productive; a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance. Five recursive layouts with successively increasing complexity of address computation are evaluated, and it is shown that addressing overheads can be kept in control even for the most computationally demanding of these layouts.
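The idea of a recursive layout over canonically ordered tiles can be illustrated with a minimal sketch. It assumes a Z-order (Morton) curve as the recursive ordering of tiles and column-major order inside each tile; the abstract does not name the five layouts it evaluates, so the curve, the function names `interleave_bits` and `tiled_morton_offset`, and the `tile` parameter are all illustrative assumptions rather than the authors' implementation.

```python
def interleave_bits(row: int, col: int) -> int:
    """Z-order (Morton) index of a tile: interleave the bits of row and col.

    Assumption: column bits go to even positions, row bits to odd positions.
    """
    z = 0
    bit = 0
    while (row >> bit) or (col >> bit):
        z |= ((col >> bit) & 1) << (2 * bit)
        z |= ((row >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return z


def tiled_morton_offset(i: int, j: int, tile: int) -> int:
    """Linear offset of element (i, j) in a hypothetical hybrid layout.

    Tiles of size tile x tile are ordered along a Z-curve; elements inside
    a tile use canonical column-major order, so the recursion stops at the
    tile level instead of reaching individual elements.
    """
    tile_row, in_row = divmod(i, tile)
    tile_col, in_col = divmod(j, tile)
    tile_index = interleave_bits(tile_row, tile_col)
    return tile_index * tile * tile + in_col * tile + in_row


if __name__ == "__main__":
    # Print the offset of every element of a 4 x 4 matrix with 2 x 2 tiles.
    for i in range(4):
        print([tiled_morton_offset(i, j, tile=2) for j in range(4)])
```

In this sketch the bit interleaving runs once per tile rather than once per element, which is one plausible reading of why stopping the recursion at tiles keeps addressing overhead low.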

Similar resources

Using Non-canonical Array Layouts in Dense Matrix Operations

We present two implementations of dense matrix multiplication based on two different non-canonical array layouts: one based on a hypermatrix data structure (HM) where data submatrices are stored using a recursive layout; the other based on a simple block data layout with square blocks (SB) where blocks are arranged in column-major order. We show that the iterative code using SB outperforms a re...

Low Complexity and High speed in Leading DCD ERLS Algorithm

Adaptive algorithms adjust the system coefficients based on the measured data. This paper presents a dichotomous coordinate descent method to reduce the computational complexity and to improve the tracking ability based on the variable forgetting factor when there are many changes in the system. Vedic mathematics is used to implement the multiplier and the divider in the VFF equatio...

Recursion removal in fast matrix multiplication

Recursion removal improves the efficiency of recursive algorithms, especially algorithms with large formal parameters, such as fast matrix multiplication algorithms. This article introduces a general method for removing recursion from fast matrix multiplication algorithms, generalized from the recursion removal of a specific fast matrix multiplication algorithm of Winograd.

Generic support of algorithmic and structural recursion for scientific computing

Recursive algorithms, like quick-sort, and recursive data structures, like trees, play a central role in programming. In the context of scientific computing, recursive algorithms and memory layouts are studied to provide excellent cache and TLB locality independently of the platform. We show how, for the first time, generic programming (GP) and OO allow us to abstract a multitude of dense-matri...

Fast recursive matrix multiplication for multi-core architectures

In this article, we present a fast algorithm for matrix multiplication optimized for recent multicore architectures. The implementation exploits different methodologies from parallel programming, like recursive decomposition, efficient low-level implementations of basic blocks, software prefetching, and task scheduling resulting in a multilevel algorithm with adaptive features. Measurements on ...

Journal:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 13

Publication date: 2002